Skip to content

Add rhos-mcps support#124

Open
Akrog wants to merge 4 commits into
openstack-lightspeed:lcore-migrationfrom
Akrog:lcore-mcps
Open

Add rhos-mcps support#124
Akrog wants to merge 4 commits into
openstack-lightspeed:lcore-migrationfrom
Akrog:lcore-mcps

Conversation

@Akrog

@Akrog Akrog commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

This PR adds support for the rhos-mcps image that provides openstack-cli and openshift-cli tools.

The MCP server is deployed as a sidecar container with the openshift-cli tool always enabled and the openstack-cli tool only enabled once the OpenStackControlPlane is ready.

By default the rhos-mcps is not deployed and requires the rhos_mcps feature flag to be enabled.

Summary by CodeRabbit

  • New Features

    • Added support for optional MCP server integration, including new runtime configuration, related images, and deployment settings.
    • Exposed new status indicators so users can see MCP server readiness and credential-secret tracking.
    • Updated the sample configuration to include MCP-related options.
  • Bug Fixes

    • Improved reconciliation to react to supporting resource changes more reliably.
    • Expanded cleanup handling so MCP-related resources are removed during deletion.

Align with the version used by the openstack-k8s-operators.
@openshift-ci openshift-ci Bot requested review from lpiwowar and umago June 24, 2026 14:43
@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Akrog

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2e2a5b8a-d122-4b94-856e-6451a3f74428

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

🧹 Nitpick comments (3)
test/kuttl/tests/rhos-mcps-configuration/04-disable-rhos-mcps.yaml (1)

21-22: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add an explicit post-disable assertion for sidecar removal.

This step disables the flag, but this scenario only verifies mcp-config cleanup later. Add an assertion step that lightspeed-stack-deployment no longer includes rhos-mcp to prevent toggle regressions from passing unnoticed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/kuttl/tests/rhos-mcps-configuration/04-disable-rhos-mcps.yaml` around
lines 21 - 22, Add an explicit assertion after disabling the feature flag to
verify the sidecar is removed from the deployment. Update the KUTTL scenario
around the rhos-mcps toggle so it checks that lightspeed-stack-deployment no
longer contains rhos-mcp, in addition to the existing mcp-config cleanup
validation, to catch regressions in the disable path.
test/kuttl/common/mock-openstack/mock-keystone.yaml (1)

118-118: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Pin the mock image instead of using :latest.

A floating tag makes this KUTTL fixture nondeterministic and can break CI when the base image changes underneath the test. Please pin a specific tag or digest here so the mock environment stays reproducible.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/kuttl/common/mock-openstack/mock-keystone.yaml` at line 118, The mock
Keystone fixture is using a floating image tag, which makes the KUTTL test
environment nondeterministic. Update the image reference in the mock OpenStack
YAML to a pinned version or digest instead of
registry.redhat.io/ubi8/python-311:latest, keeping the mock setup reproducible
across CI runs.
test/kuttl/common/mock-openstack/mock-oscp-resources.yaml (1)

84-86: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Model OpenStackControlPlane readiness in the mock status.

The PR contract says MCP enablement waits for OpenStackControlPlane to become ready, but this fixture only provides status.tls.caBundleSecretName. As written, the KUTTL flow can pass without ever exercising the ready-gated path. Add a ready condition here, or split the scenario into pre-ready and post-ready assertions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/kuttl/common/mock-openstack/mock-oscp-resources.yaml` around lines 84 -
86, The mock OpenStackControlPlane status only sets
status.tls.caBundleSecretName, so the readiness-gated path in the MCP enablement
flow is never exercised. Update the mock status fixture used by
OpenStackControlPlane to include a ready condition in addition to the TLS
bundle, using the existing OpenStackControlPlane status shape so the KUTTL
scenario can assert both pre-ready and post-ready behavior if needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/v1beta1/openstacklightspeed_types.go`:
- Around line 49-50: The MCP sidecar image is still defaulting to a mutable
“latest” tag, which means deployments can drift from the tested binary. Update
the default image constant in openstacklightspeed_types.go and ensure the same
pinned, immutable release image is used in manager.yaml and the generated CSV so
all install and upgrade paths reference the exact released MCP server image.

In `@cmd/main.go`:
- Around line 193-197: The `getDynamicWatchCRDs` call site still treats the
function as returning an error even though it never does, which triggers the
`unparam` lint issue. Update `getDynamicWatchCRDs` to remove the dead error
return, then simplify its callers in `main` and any other affected paths
(including the other mentioned block) to stop checking `err` and just use the
returned CRD value(s) directly.

In `@config/rbac/role.yaml`:
- Around line 15-26: The ClusterRole in role.yaml is granting broad cluster-wide
Secret write/read permissions that override the intended `resourceNames`
restriction for `pull-secret`. Tighten the `secrets` rule in this ClusterRole to
only the minimal cluster-scoped Secret access needed for MCP/Keystone, and move
any Secret create/update/patch/delete permissions to the namespaced Role used
for managed Secrets. Keep the existing `resourceNames`-based access for
`pull-secret` and verify the `ClusterRole` versus `Role` split preserves only
the required external reads.

In `@config/samples/api_v1beta1_openstacklightspeed.yaml`:
- Around line 18-21: The sample has a duplicate commented dev configuration
block that can lead users to create two dev keys when enabling both examples.
Update the OpenStack Lightspeed sample so the rhos_mcps option is folded into
the existing commented dev.featureFlags example rather than adding a second dev
section, using the dev and featureFlags sample entries to keep the structure
unambiguous.

In `@internal/controller/common.go`:
- Around line 292-301: GetDeploymentVolumeSection currently takes
appsv1.Deployment by value, so the returned *corev1.Volume points into a copied
Deployment rather than the caller’s object. Update GetDeploymentVolumeSection to
accept a *appsv1.Deployment (add a nil guard) and return the matching volume
from deployment.Spec.Template.Spec.Volumes so any caller mutation affects the
original Deployment.

In `@internal/controller/mcp_server.go`:
- Around line 794-820: The cleanup in mcp_server.go is deleting broadly named
ConfigMaps/Secrets without verifying they belong to this operator instance,
which can remove pre-existing or user-managed resources; update the deletion
logic around the MCP cleanup block and the related password Secret cleanup to
first check ownership via labels or controller metadata before calling
helper.GetClient().Delete. Use the existing resource symbols
MCPConfigYAMLConfigMapName, CloudsYAMLConfigMapName, SecureYAMLSecretName,
CombinedCABundleSecretName, and the lightspeed-password Secret cleanup path to
ensure only operator-managed objects are removed, especially for the
cross-namespace Secret where owner references may not be safe.
- Around line 514-516: The OpenStack MCP setup flow in mcp_server.go is treating
a Secret as ready even when AC_ID or AC_SECRET are missing, which can write
empty application-credential values into secure.yaml. Update the logic around
the return in the application credential handling path to verify both secret
keys exist and contain non-empty values before marking the secret ready or
returning success; if either key is missing, keep waiting/retrying instead of
proceeding.
- Around line 348-350: The password Secret lookup in mcp_server.go currently
treats any existing Secret as valid, even when LightspeedPasswordSecretName is
present without LightspeedPasswordSecretKey and
string(sec.Data[LightspeedPasswordSecretKey]) becomes empty. Update the
secret-handling logic in the password retrieval path to explicitly verify the
key exists and has a non-empty value before returning it; if the Secret is
missing the key, return an error instead of an empty password so the Keystone
user is not created or reconciled with a blank credential.

In `@internal/controller/openstacklightspeed_controller.go`:
- Around line 162-168: The polling in the OpenStack control flow stops too early
for cross-namespace instances, so readiness and endpoint/CA updates can be
missed after discovery. Update the logic around the `oscpWatch` check in
`openstacklightspeed_controller.go` so `RequeueAfter` continues until a real
cross-namespace watch is established, or add watch coverage for
`OpenStackControlPlane` changes outside the operator namespace; ensure the
reconcile path that handles `OpenStackReady` still requeues or watches the
relevant `OpenStackControlPlane` updates.
- Around line 479-518: Keep the DynamicWatchCRD flag latched once a watch has
been registered in WatchDynamicCRD. The current Reconcile-driven loop in
openstacklightspeed_controller.go clears seen when IsCRDEstablished reports the
CRD is gone, which can allow r.controller.Watch(...) to be registered again
later even though there is no unregister path. Update the logic around the
DynamicWatchCRD/seen handling so a successful watch registration stays marked
true and does not attempt duplicate setup on subsequent reconciles.

In `@test/kuttl/tests/application-credentials/04-assert-mock-openstack.yaml`:
- Around line 4-15: The step timeout is too tight for the commands in the mock
OpenStack assertion script, which can cause flakes when the waits run close to
their limits. Increase the timeout in the test step so the combined `oc wait`
and `oc get` operations in the `commands` block have enough headroom; update the
`timeout` setting in this KUTTL test manifest accordingly.

In `@test/kuttl/tests/application-credentials/06-assert-ac-resources.yaml`:
- Around line 10-42: The wait loops in the application-credentials assertion
script can take longer than the step timeout, so the test may be terminated
before finishing its checks. Update the timeout setting in the same script to
exceed the worst-case combined wait time used by the secret and AC CR polling
loops, and keep the timeout aligned with the logic in the shell block that waits
for lightspeed-password and the
keystoneapplicationcredentials.keystone.openstack.org resource.

In `@test/kuttl/tests/application-credentials/08-assert-mcp-credentials.yaml`:
- Around line 42-64: The checks in the application-credentials KUTTL assertion
script read `openstack-config-secret` and `mcp-config` only once, which can race
with resource availability. Update the verification flow in this test to poll
with retries and short waits before asserting on `SECURE` and `MCP_CFG`,
re-fetching the Secret and ConfigMap until the expected keys are present or a
timeout is reached. Keep the existing `oc get`/`grep` assertions in place, but
wrap them in retry logic so the `clouds.yaml`-driven readiness is handled
reliably.

In `@test/kuttl/tests/application-credentials/12-cleanup-mock-openstack.yaml`:
- Around line 13-15: The teardown in the cleanup step uses a fixed 60-second
wait for namespace deletion, which can make the scenario flaky on slower
clusters. Update the cleanup logic around the namespace deletion command in the
test to avoid relying on a hard `--timeout=60s`; instead use delete plus polling
for completion, or increase it to a safer bounded wait before removing the CRDs.

In
`@test/kuttl/tests/persistent-database/04-assert-openstack-lightspeed-instance.yaml`:
- Around line 51-52: The OpenStackLightspeedMCPServerReady assertion only checks
for a True status, so it can pass even when the MCP is enabled through an
unintended path. Update the existing assertion in this persistent-database test
to also verify the disabled message, matching the other suite checks, so the
condition is specifically validating the expected disabled-state behavior rather
than any True result.

---

Nitpick comments:
In `@test/kuttl/common/mock-openstack/mock-keystone.yaml`:
- Line 118: The mock Keystone fixture is using a floating image tag, which makes
the KUTTL test environment nondeterministic. Update the image reference in the
mock OpenStack YAML to a pinned version or digest instead of
registry.redhat.io/ubi8/python-311:latest, keeping the mock setup reproducible
across CI runs.

In `@test/kuttl/common/mock-openstack/mock-oscp-resources.yaml`:
- Around line 84-86: The mock OpenStackControlPlane status only sets
status.tls.caBundleSecretName, so the readiness-gated path in the MCP enablement
flow is never exercised. Update the mock status fixture used by
OpenStackControlPlane to include a ready condition in addition to the TLS
bundle, using the existing OpenStackControlPlane status shape so the KUTTL
scenario can assert both pre-ready and post-ready behavior if needed.

In `@test/kuttl/tests/rhos-mcps-configuration/04-disable-rhos-mcps.yaml`:
- Around line 21-22: Add an explicit assertion after disabling the feature flag
to verify the sidecar is removed from the deployment. Update the KUTTL scenario
around the rhos-mcps toggle so it checks that lightspeed-stack-deployment no
longer contains rhos-mcp, in addition to the existing mcp-config cleanup
validation, to catch regressions in the disable path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ecbaef71-33aa-44a9-a9ab-3a2bd4b1ea79

📥 Commits

Reviewing files that changed from the base of the PR and between 6a09734 and f0918cb.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (71)
  • .github/workflows/build-and-push.yaml
  • .github/workflows/verify-generation.yaml
  • Makefile
  • api/v1beta1/conditions.go
  • api/v1beta1/openstacklightspeed_types.go
  • bundle.Dockerfile
  • bundle/manifests/lightspeed.openstack.org_openstacklightspeeds.yaml
  • bundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yaml
  • bundle/metadata/annotations.yaml
  • cmd/main.go
  • config/crd/bases/lightspeed.openstack.org_openstacklightspeeds.yaml
  • config/manager/manager.yaml
  • config/manifests/bases/openstack-lightspeed-operator.clusterserviceversion.yaml
  • config/rbac/role.yaml
  • config/samples/api_v1beta1_openstacklightspeed.yaml
  • go.mod
  • internal/controller/assets/mcp_server_config.yaml.tmpl
  • internal/controller/common.go
  • internal/controller/constants.go
  • internal/controller/lcore_config.go
  • internal/controller/lcore_deployment.go
  • internal/controller/mcp_server.go
  • internal/controller/mcp_server_test.go
  • internal/controller/openstacklightspeed_controller.go
  • internal/controller/openstacklightspeed_controller_test.go
  • internal/controller/postgres_reconciler.go
  • test/kuttl/common/expected-configs/lightspeed-stack-okp.yaml
  • test/kuttl/common/expected-configs/lightspeed-stack-update.yaml
  • test/kuttl/common/expected-configs/lightspeed-stack.yaml
  • test/kuttl/common/mock-openstack/assert-mock-openstack.yaml
  • test/kuttl/common/mock-openstack/cleanup-mock-openstack.yaml
  • test/kuttl/common/mock-openstack/errors-mock-openstack.yaml
  • test/kuttl/common/mock-openstack/kac-crd.yaml
  • test/kuttl/common/mock-openstack/mock-keystone.yaml
  • test/kuttl/common/mock-openstack/mock-oscp-resources.yaml
  • test/kuttl/common/mock-openstack/oscp-crd.yaml
  • test/kuttl/common/mock-openstack/simulate-keystone-operator.sh
  • test/kuttl/common/openstack-lightspeed-instance/assert-lightspeed-stack-config.yaml
  • test/kuttl/common/openstack-lightspeed-instance/assert-mcp-config.yaml
  • test/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yaml
  • test/kuttl/common/openstack-lightspeed-instance/errors-openstack-lightspeed-instance.yaml
  • test/kuttl/tests/application-credentials/00-mock-resources.yaml
  • test/kuttl/tests/application-credentials/01-assert-mock-objects-created.yaml
  • test/kuttl/tests/application-credentials/02-create-openstack-crds.yaml
  • test/kuttl/tests/application-credentials/03-create-mock-openstack.yaml
  • test/kuttl/tests/application-credentials/04-assert-mock-openstack.yaml
  • test/kuttl/tests/application-credentials/05-create-openstack-lightspeed-instance.yaml
  • test/kuttl/tests/application-credentials/06-assert-ac-resources.yaml
  • test/kuttl/tests/application-credentials/07-simulate-keystone-operator.yaml
  • test/kuttl/tests/application-credentials/08-assert-mcp-credentials.yaml
  • test/kuttl/tests/application-credentials/09-cleanup-openstack-lightspeed-instance.yaml
  • test/kuttl/tests/application-credentials/10-assert-ac-cleanup.yaml
  • test/kuttl/tests/application-credentials/11-errors-openstack-lightspeed-instance.yaml
  • test/kuttl/tests/application-credentials/12-cleanup-mock-openstack.yaml
  • test/kuttl/tests/application-credentials/13-assert-mock-openstack.yaml
  • test/kuttl/tests/application-credentials/14-cleanup-mock-objects.yaml
  • test/kuttl/tests/application-credentials/15-errors-mock-objects.yaml
  • test/kuttl/tests/okp-configuration/03-assert-okp-instance.yaml
  • test/kuttl/tests/persistent-database/04-assert-openstack-lightspeed-instance.yaml
  • test/kuttl/tests/rhos-mcps-configuration/00-mock-resources.yaml
  • test/kuttl/tests/rhos-mcps-configuration/01-assert-mock-objects-created.yaml
  • test/kuttl/tests/rhos-mcps-configuration/02-create-rhos-mcps-resources.yaml
  • test/kuttl/tests/rhos-mcps-configuration/03-assert-rhos-mcps-instance.yaml
  • test/kuttl/tests/rhos-mcps-configuration/04-disable-rhos-mcps.yaml
  • test/kuttl/tests/rhos-mcps-configuration/05-errors-rhos-mcps-cleanup.yaml
  • test/kuttl/tests/rhos-mcps-configuration/06-cleanup-openstack-lightspeed-instance.yaml
  • test/kuttl/tests/rhos-mcps-configuration/07-errors-openstack-lightspeed-instance.yaml
  • test/kuttl/tests/rhos-mcps-configuration/08-cleanup-mock-objects.yaml
  • test/kuttl/tests/rhos-mcps-configuration/09-errors-mock-objects.yaml
  • test/kuttl/tests/update-openstacklightspeed/08-assert-openstacklightspeed-update.yaml
  • test/kuttl/tests/update-openstacklightspeed/11-assert-configmaps-update.yaml

Comment on lines +49 to +50
// MCPServerContainerImage is the fall-back container image for the MCP server
MCPServerContainerImage = "quay.io/openstack-lightspeed/rhos-mcps:latest"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

sed -n '1,120p' api/v1beta1/openstacklightspeed_types.go && printf '\n---\n' && sed -n '300,360p' api/v1beta1/openstacklightspeed_types.go

Repository: openstack-lightspeed/operator

Length of output: 6783


🏁 Script executed:

rg -n "MCPServerContainerImage|RELATED_IMAGE_MCP_SERVER_IMAGE_URL_DEFAULT|latest|rhos-mcps" api/v1beta1/openstacklightspeed_types.go api -S

Repository: openstack-lightspeed/operator

Length of output: 2317


🏁 Script executed:

git diff --stat -- api/v1beta1/openstacklightspeed_types.go && printf '\n---\n' && git diff -- api/v1beta1/openstacklightspeed_types.go --unified=20

Repository: openstack-lightspeed/operator

Length of output: 172


🏁 Script executed:

rg -n "RELATED_IMAGE_MCP_SERVER_IMAGE_URL_DEFAULT|MCPServerImageURL|MCPServerContainerImage|rhos-mcps" -S .

Repository: openstack-lightspeed/operator

Length of output: 1449


🏁 Script executed:

rg -n "RELATED_IMAGE_.*IMAGE_URL_DEFAULT|RELATED_IMAGE_.*DEFAULT" -S config bundle manifests deploy api controllers | sed -n '1,220p'

Repository: openstack-lightspeed/operator

Length of output: 3145


🏁 Script executed:

fd -a -t f 'csv|yaml|yml' config bundle manifests deploy . | sed -n '1,200p'

Repository: openstack-lightspeed/operator

Length of output: 10746


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== related image refs =="
rg -n "RELATED_IMAGE_MCP_SERVER_IMAGE_URL_DEFAULT|RELATED_IMAGE_.*IMAGE_URL_DEFAULT" config bundle manifests deploy api controllers -S || true

echo
echo "== candidate manifest files =="
fd -a -t f 'csv|yaml|yml' config bundle manifests deploy . | sed -n '1,200p'

Repository: openstack-lightspeed/operator

Length of output: 13781


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== related image refs =="
rg -n "RELATED_IMAGE_MCP_SERVER_IMAGE_URL_DEFAULT|RELATED_IMAGE_.*IMAGE_URL_DEFAULT" config bundle manifests deploy api controllers -S || true

echo
echo "== manifest files =="
git ls-files 'config/**' 'bundle/**' 'manifests/**' 'deploy/**' | sed -n '1,200p'

Repository: openstack-lightspeed/operator

Length of output: 5126


Pin the MCP sidecar image to an immutable release.

quay.io/openstack-lightspeed/rhos-mcps:latest is still used as the default in api/v1beta1/openstacklightspeed_types.go, and the same tag is baked into config/manager/manager.yaml and the generated CSV. That makes installs and upgrades pick up whatever image is current at deploy time instead of the tested MCP binary.

Suggested change
-	MCPServerContainerImage = "quay.io/openstack-lightspeed/rhos-mcps:latest"
+	MCPServerContainerImage = "quay.io/openstack-lightspeed/rhos-mcps:<release-or-digest>"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/v1beta1/openstacklightspeed_types.go` around lines 49 - 50, The MCP
sidecar image is still defaulting to a mutable “latest” tag, which means
deployments can drift from the tested binary. Update the default image constant
in openstacklightspeed_types.go and ensure the same pinned, immutable release
image is used in manager.yaml and the generated CSV so all install and upgrade
paths reference the exact released MCP server image.

Comment thread cmd/main.go Outdated
Comment on lines +193 to +197
dynamicWatchCRDs, err := getDynamicWatchCRDs()
if err != nil {
setupLog.Error(err, "unable to retrieve DynamicWatchCRDs")
os.Exit(1)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Drop the dead error return from getDynamicWatchCRDs.

getDynamicWatchCRDs never returns a non-nil error, and golangci-lint flags this as unparam; this can block CI until the signature and caller are simplified.

Proposed fix
-	dynamicWatchCRDs, err := getDynamicWatchCRDs()
-	if err != nil {
-		setupLog.Error(err, "unable to retrieve DynamicWatchCRDs")
-		os.Exit(1)
-	}
+	dynamicWatchCRDs := getDynamicWatchCRDs()
-func getDynamicWatchCRDs() (map[schema.GroupVersionKind]*atomic.Bool, error) {
+func getDynamicWatchCRDs() map[schema.GroupVersionKind]*atomic.Bool {
 	dynamicWatchCRDs := map[schema.GroupVersionKind]*atomic.Bool{
 		{
 			Group:   "core.openstack.org",
@@
 	}
 
-	return dynamicWatchCRDs, nil
+	return dynamicWatchCRDs
 }

Also applies to: 251-266

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/main.go` around lines 193 - 197, The `getDynamicWatchCRDs` call site
still treats the function as returning an error even though it never does, which
triggers the `unparam` lint issue. Update `getDynamicWatchCRDs` to remove the
dead error return, then simplify its callers in `main` and any other affected
paths (including the other mentioned block) to stop checking `err` and just use
the returned CRD value(s) directly.

Source: Linters/SAST tools

Comment thread config/rbac/role.yaml
Comment on lines +15 to +26
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- delete
- get
- list
- patch
- update
- watch

@coderabbitai coderabbitai Bot Jun 24, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Narrow cluster-scoped Secret access.

This ClusterRole now grants create/delete/list/watch/update/patch on every Secret in the cluster, making the pull-secret resourceNames rule below ineffective. Keep cluster scope to the minimum external Secret reads needed for MCP/Keystone, and rely on the namespaced Role for managed Secret writes.

Suggested direction
 - apiGroups:
   - ""
   resources:
   - secrets
   verbs:
-  - create
-  - delete
   - get
-  - list
-  - patch
-  - update
-  - watch
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- ""
resources:
- secrets
verbs:
- get
🧰 Tools
🪛 Trivy (0.69.3)

[error] 15-26: Manage secrets

ClusterRole 'manager-role' shouldn't have access to manage resource 'secrets'

Rule: KSV-0041

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/rbac/role.yaml` around lines 15 - 26, The ClusterRole in role.yaml is
granting broad cluster-wide Secret write/read permissions that override the
intended `resourceNames` restriction for `pull-secret`. Tighten the `secrets`
rule in this ClusterRole to only the minimal cluster-scoped Secret access needed
for MCP/Keystone, and move any Secret create/update/patch/delete permissions to
the namespaced Role used for managed Secrets. Keep the existing
`resourceNames`-based access for `pull-secret` and verify the `ClusterRole`
versus `Role` split preserves only the required external reads.

Source: Linters/SAST tools

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is important for us to address. Not sure we want the operator to read/write every secret in the cluster

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Outline: internal/controller/common.go =="
ast-grep outline internal/controller/common.go --view expanded || true

echo
echo "== Outline: internal/controller/mcp_server.go =="
ast-grep outline internal/controller/mcp_server.go --view expanded || true

echo
echo "== Secret/ConfigMap API usage in controller code =="
rg -n -C2 --glob 'internal/controller/*.go' '\b(Get|List|CreateOrPatch|Delete|Patch|Update)\b|Secret\b|ConfigMap\b' internal/controller

echo
echo "== Focus: common.go relevant slice =="
sed -n '1,260p' internal/controller/common.go

echo
echo "== Focus: mcp_server.go relevant slices =="
sed -n '1,260p' internal/controller/mcp_server.go
echo
sed -n '260,620p' internal/controller/mcp_server.go
echo
sed -n '620,980p' internal/controller/mcp_server.go

echo
echo "== Focus: openstacklightspeed_controller.go relevant slices =="
sed -n '1,260p' internal/controller/openstacklightspeed_controller.go

Length of output: 161955



@umago, I re-checked the controller code and agree this is still an important issue to address.

One nuance though: the current MCP flow does need some cross-namespace Secret access in the OpenStackControlPlane namespace, so my earlier “reduce it to get only” direction was too aggressive.

What I found:

  • ensurePasswordSecret() creates lightspeed-password in the OSCP namespace.
  • reconcileACSecret() updates finalizers on the current/rotated AC Secret in that namespace.
  • reconcileDeleteOpenStackResources() removes that finalizer and deletes lightspeed-password.
  • The rest of the external Secret usage is read-only.

So the concern is valid, but the tighter fix is:

  • keep the namespaced Role for operator-owned Secrets in openstack-lightspeed
  • narrow the cluster-scoped secrets rule to only the verbs actually used cross-namespace
    • likely get, create, update, delete
  • drop cluster-wide list, watch, and patch for Secrets unless there’s another caller not in this PR

Also worth checking separately: the cluster-scoped configmaps rule looks read-only from current usage, so that one may be reducible too.

If the goal is zero cluster-wide Secret writes, that would require a code-path redesign, because today the reconciler explicitly mutates Secrets in the OSCP namespace.

Comment on lines +18 to +21
# Uncomment to enable RHOS MCPs (MCP server sidecar with OpenStack/OpenShift tools):
# dev:
# featureFlags:
# - rhos_mcps

@coderabbitai coderabbitai Bot Jun 24, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Avoid a second commented dev: block in the sample.

If users enable both RHOS MCPs and OKP by uncommenting both examples, the sample ends up with duplicate dev keys, so one block gets dropped. Fold rhos_mcps into the existing commented dev.featureFlags example instead.

Suggested change
-  # Uncomment to enable RHOS MCPs (MCP server sidecar with OpenStack/OpenShift tools):
-  # dev:
-  #   featureFlags:
-  #     - rhos_mcps
+  # Uncomment to enable feature flags:
+  # dev:
+  #   featureFlags:
+  #     - rhos_mcps
   # Uncomment to enable OKP (Offline Knowledge Portal) as an Inline RAG source:
   # okp:
   #   accessKey: okp-access-key-secret
-  # dev:
-  #   featureFlags:
-  #     - okp
+  #   featureFlags:
+  #     - okp
   #   okpChunkFilterQuery: "product:(*openstack* OR *openshift*)"
   #   okpRagOnly: true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/samples/api_v1beta1_openstacklightspeed.yaml` around lines 18 - 21,
The sample has a duplicate commented dev configuration block that can lead users
to create two dev keys when enabling both examples. Update the OpenStack
Lightspeed sample so the rhos_mcps option is folded into the existing commented
dev.featureFlags example rather than adding a second dev section, using the dev
and featureFlags sample entries to keep the structure unambiguous.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umago Sounds good — consolidating this into a single commented dev.featureFlags example will keep the sample unambiguous when both RHOS MCPs and OKP are enabled.

🐇👍

Comment on lines +292 to +301
// GetDeploymentVolumeSection returns a pointer to the Volume in the Deployment's PodSpec
// whose name matches the given volumeSectionName. If no such volume is found, it returns nil.
func GetDeploymentVolumeSection(deployment appsv1.Deployment, volumeSectionName string) *corev1.Volume {
for i, volume := range deployment.Spec.Template.Spec.Volumes {
if volume.Name == volumeSectionName {
return &deployment.Spec.Template.Spec.Volumes[i]
}
}
return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Inspect all GetDeploymentVolumeSection call sites after changing the helper to accept *appsv1.Deployment.
rg -n -C3 --type=go '\bGetDeploymentVolumeSection\s*\(' .

Repository: openstack-lightspeed/operator

Length of output: 866


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all references to the helper and inspect the surrounding code.
rg -n -C 4 '\bGetDeploymentVolumeSection\s*\(' internal .

Repository: openstack-lightspeed/operator

Length of output: 1706


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all references to the helper and inspect the surrounding code.
rg -n -C 4 '\bGetDeploymentVolumeSection\s*\(' internal .

Repository: openstack-lightspeed/operator

Length of output: 1706


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '240,330p' internal/controller/common.go

Repository: openstack-lightspeed/operator

Length of output: 3229


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '240,330p' internal/controller/common.go

Repository: openstack-lightspeed/operator

Length of output: 3229


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '240,330p' internal/controller/common.go

Repository: openstack-lightspeed/operator

Length of output: 3229


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for all references to the helper and any direct mutation of its result.
rg -n '\bGetDeploymentVolumeSection\b|GetDeploymentVolumeSection\s*\(' internal .

Repository: openstack-lightspeed/operator

Length of output: 697


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for all references to the helper and any direct mutation of its result.
rg -n '\bGetDeploymentVolumeSection\b|GetDeploymentVolumeSection\s*\(' internal .

Repository: openstack-lightspeed/operator

Length of output: 697


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for all references to the helper and any direct mutation of its result.
rg -n '\bGetDeploymentVolumeSection\b|GetDeploymentVolumeSection\s*\(' internal .

Repository: openstack-lightspeed/operator

Length of output: 697


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for all references to the helper and any direct mutation of its result.
rg -n '\bGetDeploymentVolumeSection\b|GetDeploymentVolumeSection\s*\(' internal .

Repository: openstack-lightspeed/operator

Length of output: 697


Return a pointer into the caller’s Deployment, not a copy. GetDeploymentVolumeSection takes appsv1.Deployment by value, so the returned *corev1.Volume points into a copied struct. If callers need to mutate the volume, accept *appsv1.Deployment instead (with a nil guard).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/common.go` around lines 292 - 301,
GetDeploymentVolumeSection currently takes appsv1.Deployment by value, so the
returned *corev1.Volume points into a copied Deployment rather than the caller’s
object. Update GetDeploymentVolumeSection to accept a *appsv1.Deployment (add a
nil guard) and return the matching volume from
deployment.Spec.Template.Spec.Volumes so any caller mutation affects the
original Deployment.

Comment on lines +4 to +15
timeout: 120
commands:
- script: |
#!/bin/bash
set -euo pipefail
oc wait --for=condition=Established \
crd/openstackcontrolplanes.core.openstack.org \
crd/keystoneapplicationcredentials.keystone.openstack.org \
--timeout=60s
oc get namespace openstack
oc wait --for=condition=Ready pod/mock-keystone-server -n openstack --timeout=60s
oc get openstackcontrolplane openstack-galera-network-isolation -n openstack

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Increase step timeout to avoid edge-timeout flakes.

Line 4 sets timeout: 120, while Lines 9-12 and Line 14 can already consume up to 120s combined, so command overhead can trigger premature step timeout.

Suggested fix
-kind: TestAssert
-timeout: 120
+kind: TestAssert
+timeout: 180
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
timeout: 120
commands:
- script: |
#!/bin/bash
set -euo pipefail
oc wait --for=condition=Established \
crd/openstackcontrolplanes.core.openstack.org \
crd/keystoneapplicationcredentials.keystone.openstack.org \
--timeout=60s
oc get namespace openstack
oc wait --for=condition=Ready pod/mock-keystone-server -n openstack --timeout=60s
oc get openstackcontrolplane openstack-galera-network-isolation -n openstack
timeout: 180
commands:
- script: |
#!/bin/bash
set -euo pipefail
oc wait --for=condition=Established \
crd/openstackcontrolplanes.core.openstack.org \
crd/keystoneapplicationcredentials.keystone.openstack.org \
--timeout=60s
oc get namespace openstack
oc wait --for=condition=Ready pod/mock-keystone-server -n openstack --timeout=60s
oc get openstackcontrolplane openstack-galera-network-isolation -n openstack
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/kuttl/tests/application-credentials/04-assert-mock-openstack.yaml`
around lines 4 - 15, The step timeout is too tight for the commands in the mock
OpenStack assertion script, which can cause flakes when the waits run close to
their limits. Increase the timeout in the test step so the combined `oc wait`
and `oc get` operations in the `commands` block have enough headroom; update the
`timeout` setting in this KUTTL test manifest accordingly.

Comment on lines +10 to +42
for i in $(seq 1 60); do
if oc get secret lightspeed-password -n openstack 2>/dev/null; then
echo "Password secret found"
break
fi
if [ "$i" -eq 60 ]; then
echo "ERROR: Password secret not created after 120s"
exit 1
fi
sleep 2
done

PW=$(oc get secret lightspeed-password -n openstack \
-o jsonpath='{.data.password}' | base64 -d)
if [ -z "$PW" ]; then
echo "ERROR: Password secret has no password key"
exit 1
fi
echo "Password secret has password key (length: ${#PW})"

echo "Waiting for AC CR to be created..."
for i in $(seq 1 60); do
if oc get keystoneapplicationcredentials.keystone.openstack.org lightspeed \
-n openstack 2>/dev/null; then
echo "AC CR found"
break
fi
if [ "$i" -eq 60 ]; then
echo "ERROR: AC CR not created after 120s"
exit 1
fi
sleep 2
done

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Command timeout is lower than the script’s worst-case wait path.

Lines 10-20 and 31-42 can wait 240s total, but Line 67 sets timeout to 180s, so the step can be killed before the script completes its intended checks.

Suggested fix
-    timeout: 180
+    timeout: 300

Also applies to: 67-67

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/kuttl/tests/application-credentials/06-assert-ac-resources.yaml` around
lines 10 - 42, The wait loops in the application-credentials assertion script
can take longer than the step timeout, so the test may be terminated before
finishing its checks. Update the timeout setting in the same script to exceed
the worst-case combined wait time used by the secret and AC CR polling loops,
and keep the timeout aligned with the logic in the shell block that waits for
lightspeed-password and the
keystoneapplicationcredentials.keystone.openstack.org resource.

Comment on lines +42 to +64
# Verify openstack-config-secret Secret
SECURE=$(oc get secret openstack-config-secret -n openstack-lightspeed \
-o jsonpath='{.data.secure\.yaml}' | base64 -d)
if ! echo "$SECURE" | grep -q "application_credential_secret: mock-ac-secret-abcde"; then
echo "ERROR: secure.yaml missing application_credential_secret"
echo "$SECURE"
exit 1
fi
echo "secure.yaml has correct application_credential_secret"

# Verify MCP config has OpenStack enabled
MCP_CFG=$(oc get configmap mcp-config -n openstack-lightspeed \
-o jsonpath='{.data.config\.yaml}')
if ! echo "$MCP_CFG" | grep -qP "openstack:\s*\n\s+enabled:\s+true"; then
# Try with a simpler grep
ENABLED=$(echo "$MCP_CFG" | grep -A1 "openstack:" | grep "enabled:" | tr -d ' ')
if [ "$ENABLED" != "enabled:true" ]; then
echo "ERROR: MCP config does not have openstack enabled"
echo "$MCP_CFG"
exit 1
fi
fi
echo "MCP config has OpenStack enabled"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Add readiness polling for dependent MCP artifacts.

After detecting clouds.yaml, the script reads openstack-config-secret and mcp-config only once. Those resources can lag and cause intermittent failures; poll them with retries before content assertions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/kuttl/tests/application-credentials/08-assert-mcp-credentials.yaml`
around lines 42 - 64, The checks in the application-credentials KUTTL assertion
script read `openstack-config-secret` and `mcp-config` only once, which can race
with resource availability. Update the verification flow in this test to poll
with retries and short waits before asserting on `SECURE` and `MCP_CFG`,
re-fetching the Secret and ConfigMap until the expected keys are present or a
timeout is reached. Keep the existing `oc get`/`grep` assertions in place, but
wrap them in retry logic so the `clouds.yaml`-driven readiness is handled
reliably.

Comment on lines +13 to +15
oc delete namespace openstack --ignore-not-found=true --timeout=60s
oc delete crd openstackcontrolplanes.core.openstack.org keystoneapplicationcredentials.keystone.openstack.org --ignore-not-found=true
timeout: 120

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Avoid a fixed 60s namespace-delete wait in teardown.

The hard --timeout=60s can fail on slower clusters while cleanup is still progressing, making this scenario flaky. Prefer delete+poll (or a larger bounded wait) before proceeding.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/kuttl/tests/application-credentials/12-cleanup-mock-openstack.yaml`
around lines 13 - 15, The teardown in the cleanup step uses a fixed 60-second
wait for namespace deletion, which can make the scenario flaky on slower
clusters. Update the cleanup logic around the namespace deletion command in the
test to avoid relying on a hard `--timeout=60s`; instead use delete plus polling
for completion, or increase it to a safer bounded wait before removing the CRDs.

Comment on lines +51 to +52
- type: OpenStackLightspeedMCPServerReady
status: "True"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the MCP disabled message here as well.

Line 51 only checks that the condition is True. In this suite rhos_mcps is still off by default, so the test should also verify the disabled message; otherwise a different True path would still pass.

Suggested change
     - type: OpenStackLightspeedMCPServerReady
       status: "True"
+      message: "RHOS MCP server is disabled (rhos_mcps feature flag not set)"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- type: OpenStackLightspeedMCPServerReady
status: "True"
- type: OpenStackLightspeedMCPServerReady
status: "True"
message: "RHOS MCP server is disabled (rhos_mcps feature flag not set)"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/kuttl/tests/persistent-database/04-assert-openstack-lightspeed-instance.yaml`
around lines 51 - 52, The OpenStackLightspeedMCPServerReady assertion only
checks for a True status, so it can pass even when the MCP is enabled through an
unintended path. Update the existing assertion in this persistent-database test
to also verify the disabled message, matching the other suite checks, so the
condition is specifically validating the expected disabled-state behavior rather
than any True result.

Akrog and others added 3 commits June 25, 2026 10:58
With this commit we start deploying the MCP server as a sidecar
container of the lightspeed-service container.

The MCP server deployed is the one from our own rhos-mcps repository
[1].

On installation the `openshift-cli` tool is enabled when OpenStack
Lightspeed is configured, and the `openstack-cli` tool is enabled when
the `OpenStackControlPlane` is ready.

From a security perspective since we are deploying the MCP tools in the
pod's network namespace not using TLS is not a real security risk.

[1]: https://github.com/openstack-lightspeed/rhos-mcps

Jira: OSPRH-27075
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
For now we don't want to automatically deploy the MCP tools, because:

- Releasing the rhos-mcps image may be done at a later time.
- This is an experimental feature right now, so we want it disabled by
  default.

The name of the feature flag is `rhos_mcps` as shown in the config
ample.

Jira: OSPRH-27075
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Initial implementation of the MCP deployment uses the credentials from
the `openstackclient` pod, which means that we are not the owners of
that secret, just the copy we make in the `openstack-lightspeed`
namespace, so those credentials could be removed/deleted and that would
break our `openstack-cli` tool.

In this patch we change the credentials and we leverage the
`KeystoneApplicationCredential` CR to get our own credentials.

Credential Rotation is handled by the code as well.

Jira: OSPRH-27075
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@umago umago left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking but I think there are some comments (specially the ones from coderabbit) that needs to be addressed

Comment thread config/rbac/role.yaml
Comment on lines +15 to +26
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- delete
- get
- list
- patch
- update
- watch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is important for us to address. Not sure we want the operator to read/write every secret in the cluster

Comment on lines +18 to +21
# Uncomment to enable RHOS MCPs (MCP server sidecar with OpenStack/OpenShift tools):
# dev:
# featureFlags:
# - rhos_mcps

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +479 to +518
for gvk, seen := range r.DynamicWatchCRD {
if seen.Load() {
// Re-verify CRD still exists — it may have been uninstalled.
crdAvailable, err := IsCRDEstablished(ctx, helper, gvk)
if err != nil {
return err
}
if !crdAvailable {
seen.Store(false)
}
continue
}

crdAvailable, err := IsCRDEstablished(ctx, helper, gvk)
if err != nil {
return err
}

if !crdAvailable {
continue
}

GVKUnstructObj := &uns.Unstructured{}
GVKUnstructObj.SetGroupVersionKind(gvk)
err = r.controller.Watch(
source.Kind(
r.Cache,
GVKUnstructObj,
handler.TypedEnqueueRequestsFromMapFunc(func(ctx context.Context, o *uns.Unstructured) []ctrl.Request {
return r.NotifyAllOpenStackLightspeeds(ctx, o)
}),
predicate.TypedResourceVersionChangedPredicate[*uns.Unstructured]{},
),
)
if err != nil {
return fmt.Errorf("failed to set up watch for %s: %w", GetCRDName(gvk), err)
}

seen.Store(true)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Stale watches are not cleaned up when/if a CRD disappears, would be good to address this too.


if err := osClient.AssignUserRole(log, "admin", userID, serviceProject.ID); err != nil {
return fmt.Errorf("failed to assign admin role: %w", err)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cache this to say that the user was already created successfully ? Since this is being called as part of reconcileMCPServerWithOpenStack it will talk to keystone on everyr reconcile. Feels unnecessary

Watches(
&apiextensionsv1.CustomResourceDefinition{},
handler.EnqueueRequestsFromMapFunc(r.NotifyAllOpenStackLightspeeds),
builder.WithPredicates(predicate.ResourceVersionChangedPredicate{}),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we filter this ? And watch for openstackcontrolplanes.core.openstack.org specially ? Right now we are watching all CRDs in the cluster

builder.WithPredicates(predicate.ResourceVersionChangedPredicate{}),
).
Watches(
clusterVersion,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follwo-up comment L407 (below)

I guess we do the same here, but I think clusterVersion is just one in the cluster but CRDs we can have many.

@lpiwowar lpiwowar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the PR 👍 . Thank you! I like the KeystoneApplicationCredentials part.

}

// OpenStackLightspeedChecksumAnnotation is the annotation key used to store the checksum of resources.
const OpenStackLightspeedChecksumAnnotation = "openstack.org/checksum"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (non-blocking): This can be moved to constants.go.

// cache-based watch only covers the operator's namespace, so
// periodic reconciliation discovers OSCP instances elsewhere.
// Once OpenStack is detected and configured, the watch handles updates.
oscpWatch := r.DynamicWatchCRD[OpenStackControlPlaneGVK()]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): I know that there is a dependency between OpenStackControlPlane and KeystoneApplicationCredential CRDs. Meaning KeystoneApplicationCredential can not exist without OpenStackControlPlane and vice versa but it is not obvious here.

I think this part should be generic. What I'm trying to say is that there should be a for loop over r.DynamicWatchCRD and result.RequeueAfter setting should be skipped only when all dynamically watched CRDs are present.

limits:
cpu: 500m
memory: 128Mi
memory: 256Mi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (non-blocking): I'm just curious why exactly is this needed. What error was ecnountered?

VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: CombinedCABundleSecretName,
Optional: toPtr(true),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (blocking): Why do we need to set Optional == true? I think all of the resources are needed for proper functioning of the MCP server. By setting to true we are hiding errors related to missing clouds.yaml / secure.yaml or tls-ca-bundle.pem in the MCP servers logs. If we keep it set to false it will be immediately obvious what is wrong if the MCP server fails to deploy due to missing creds (or any other file ..)

KeystoneApplicationCredentialVersion = "v1beta1"
KeystoneApplicationCredentialKind = "KeystoneApplicationCredential"

// Lightspeed Service User

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (non-blocking): The comment could be a little bit more descriptive. Hard to tell just from the constants name what they are related to.

Comment on lines +7 to +15
#!/bin/bash
set -euo pipefail
oc wait --for=condition=Established \
crd/openstackcontrolplanes.core.openstack.org \
crd/keystoneapplicationcredentials.keystone.openstack.org \
--timeout=60s
oc get namespace openstack
oc wait --for=condition=Ready pod/mock-keystone-server -n openstack --timeout=60s
oc get openstackcontrolplane openstack-galera-network-isolation -n openstack

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (blocking): Why this can not be a standard Kuttl assert using YAML manifests?

Comment on lines +6 to +65
#!/bin/bash
set -euo pipefail

echo "Waiting for password secret to be created..."
for i in $(seq 1 60); do
if oc get secret lightspeed-password -n openstack 2>/dev/null; then
echo "Password secret found"
break
fi
if [ "$i" -eq 60 ]; then
echo "ERROR: Password secret not created after 120s"
exit 1
fi
sleep 2
done

PW=$(oc get secret lightspeed-password -n openstack \
-o jsonpath='{.data.password}' | base64 -d)
if [ -z "$PW" ]; then
echo "ERROR: Password secret has no password key"
exit 1
fi
echo "Password secret has password key (length: ${#PW})"

echo "Waiting for AC CR to be created..."
for i in $(seq 1 60); do
if oc get keystoneapplicationcredentials.keystone.openstack.org lightspeed \
-n openstack 2>/dev/null; then
echo "AC CR found"
break
fi
if [ "$i" -eq 60 ]; then
echo "ERROR: AC CR not created after 120s"
exit 1
fi
sleep 2
done

# Verify AC CR spec
USERNAME=$(oc get keystoneapplicationcredentials.keystone.openstack.org lightspeed \
-n openstack -o jsonpath='{.spec.userName}')
if [ "$USERNAME" != "lightspeed" ]; then
echo "ERROR: AC CR userName is '$USERNAME', expected 'lightspeed'"
exit 1
fi

SECRET_REF=$(oc get keystoneapplicationcredentials.keystone.openstack.org lightspeed \
-n openstack -o jsonpath='{.spec.secret}')
if [ "$SECRET_REF" != "lightspeed-password" ]; then
echo "ERROR: AC CR secret is '$SECRET_REF', expected 'lightspeed-password'"
exit 1
fi

EDPM_SVC=$(oc get keystoneapplicationcredentials.keystone.openstack.org lightspeed \
-n openstack -o jsonpath='{.metadata.annotations.keystone\.openstack\.org/edpm-service}')
if [ "$EDPM_SVC" != "false" ]; then
echo "ERROR: AC CR edpm-service annotation is '$EDPM_SVC', expected 'false'"
exit 1
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (blocking): Why this can not be a standard Kuttl assert using YAML manifests? To me it seems like there is nothing preventing it.

Comment on lines +8 to +14
oc delete openstackcontrolplane openstack-galera-network-isolation -n openstack --ignore-not-found=true
oc delete pod mock-keystone-server -n openstack --ignore-not-found=true
oc delete svc mock-keystone-server -n openstack --ignore-not-found=true
oc delete configmap mock-keystone-code openstack-config-mock -n openstack --ignore-not-found=true
oc delete secret openstack-config-secret-mock combined-ca-bundle-mock ac-lightspeed-test-secret -n openstack --ignore-not-found=true
oc delete namespace openstack --ignore-not-found=true --timeout=60s
oc delete crd openstackcontrolplanes.core.openstack.org keystoneapplicationcredentials.keystone.openstack.org --ignore-not-found=true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (blocking): Why can't we use?

---
apiVersion: kuttl.dev/v1beta1
kind: TestStep
delete:

Comment on lines +8 to +17
set -euo pipefail
! oc get pod mock-keystone-server -n openstack 2>/dev/null
! oc get svc mock-keystone-server -n openstack 2>/dev/null
! oc get configmap mock-keystone-code -n openstack 2>/dev/null
! oc get configmap openstack-config-mock -n openstack 2>/dev/null
! oc get secret openstack-config-secret-mock -n openstack 2>/dev/null
! oc get secret combined-ca-bundle-mock -n openstack 2>/dev/null
! oc get namespace openstack 2>/dev/null
! oc get crd openstackcontrolplanes.core.openstack.org 2>/dev/null
! oc get crd keystoneapplicationcredentials.keystone.openstack.org 2>/dev/null

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (blocking): Why can't we use the standard -errors.yaml file to verify resources are missing in a cluster?

helper *common_helper.Helper,
namespace string,
) error {
rawClient, err := getRawClient(helper)

@lpiwowar lpiwowar Jun 29, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Can't we go without the getRawClient here (+ other places where it got introduced as part of this PR)? I do not like how the getRawClient is slowly spreading across the operator's code. I think if the customer sets the OperatorGroup correctly then WATCH_NAMESPACE will be set to openstack-operators / openstack and there is no reason to use the raw client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants